Unicode in Microsoft Windows

Microsoft began implementing Unicode consistently in its products quite early. Windows NT was the first operating system to use Unicode in system calls. It initially used the UCS-2 encoding scheme and was upgraded to UTF-16 starting with Windows 2000, which allows characters in the supplementary planes to be represented with surrogate pairs.
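To make the surrogate-pair mechanism concrete, the following Python sketch splits a supplementary-plane code point into two 16-bit code units using the standard UTF-16 arithmetic, and compares the result with Python's built-in utf-16-le codec. The helper names to_surrogate_pair and utf16_code_units are invented for this illustration and are not part of any Windows API.

```python
import struct
from typing import List, Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


def utf16_code_units(text: str) -> List[int]:
    """Return the 16-bit code units of text as encoded in UTF-16LE."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


if __name__ == "__main__":
    clef = 0x1D11E  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane
    print([hex(u) for u in to_surrogate_pair(clef)])
    print([hex(u) for u in utf16_code_units(chr(clef))])
```

A system limited to UCS-2 would treat the two code units as two separate, meaningless characters. A UTF-16 aware system combines them into one code point.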

In various Windows families

Windows NT based systems

Windows XP and Windows Server 2003, like Windows NT 4 and Windows 2000 before them, ship with system libraries that support string encoding of both types: Unicode and the current code page, which is still incorrectly referred to as the "ANSI" code page. Unicode functions have names suffixed with -W (from "wide"), for example lstrlenW(). Code page oriented functions use the suffix -A, e.g. lstrlenA(). This allows the Windows NT family to run Unicode-capable programs side by side with older programs that use 8-bit encodings. Most of these ANSI functions are implemented as wrappers over the corresponding Unicode functions.
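The wrapper relationship between the -A and -W variants can be modelled in a few lines of Python. This is only a conceptual sketch: upper_w and upper_a are hypothetical stand-ins rather than real Windows functions, and the default code page cp1252 is an assumption chosen for the example.

```python
def upper_w(wide: bytes) -> bytes:
    """Hypothetical 'W' variant: takes and returns UTF-16LE bytes and does the real work."""
    return wide.decode("utf-16-le").upper().encode("utf-16-le")


def upper_a(ansi: bytes, code_page: str = "cp1252") -> bytes:
    """Hypothetical 'A' variant: a thin wrapper around upper_w.

    It converts the code page string to UTF-16LE, delegates to the wide
    function, and converts the result back to the code page.
    """
    wide_in = ansi.decode(code_page).encode("utf-16-le")
    wide_out = upper_w(wide_in)
    # Characters that the code page cannot represent become '?' in this model.
    return wide_out.decode("utf-16-le").encode(code_page, errors="replace")


if __name__ == "__main__":
    # 0xE9 is 'é' in code page 1252; the same text is two bytes per character in UTF-16LE.
    print(upper_a(b"caf\xe9"))
    print(upper_w("café".encode("utf-16-le")))
```

In this model the text processing lives only in the wide function. The code page variant adds nothing but a conversion step on the way in and on the way out.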

The IsTextUnicode function applies a heuristic algorithm to a byte string passed to it to detect whether the string represents Unicode text. For very short texts, this function, which is used by some applications such as Notepad, often gives incorrect results. This gave rise to legends about the existence of "Easter eggs" such as "Bush hid the facts".
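The "Bush hid the facts" case is easy to reproduce without Windows. The Python sketch below does not re-implement the heuristics of IsTextUnicode, which are not described here. It only shows what happens when that plain ASCII sentence is misread as UTF-16LE. The function names are made up for the example.

```python
def misread_as_utf16le(ascii_text: str) -> str:
    """Reinterpret the bytes of an ASCII string as UTF-16LE code units."""
    raw = ascii_text.encode("ascii")
    if len(raw) % 2:
        raise ValueError("UTF-16 needs an even number of bytes")
    return raw.decode("utf-16-le")


def all_cjk_ideographs(text: str) -> bool:
    """True if every character lies in the CJK Unified Ideographs block."""
    return all(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


if __name__ == "__main__":
    misread = misread_as_utf16le("Bush hid the facts")
    print(len(misread), all_cjk_ideographs(misread))
    print([hex(ord(ch)) for ch in misread])
```

Each pair of ASCII bytes forms one 16-bit value. For this sentence all nine values fall in the CJK Unified Ideographs block, so the bytes look like plausible Unicode text. That makes the wrong guess understandable for such a short input.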

Windows CE

In Windows CE, UTF-16 was used almost exclusively.

Windows 9x

In 2001, Microsoft released a special supplement for its older Windows 9x systems. It includes a dynamic-link library, unicows.dll (only 240 KB), containing the Unicode flavor (the variants with the letter W at the end of the name) of all the basic functions of the Windows API.

Various encoding schemes

Although Windows uses the UTF-16LE encoding scheme internally, in the NTFS file system, in executables and sometimes in text files, Unicode's byte-oriented encodings UTF-8 and even UTF-7 are supported as well. An application that has to support UTF-8 or UTF-7 by means of the Windows API should, paradoxically, call the same functions, MultiByteToWideChar and WideCharToMultiByte, that are used to support "legacy" (i.e. pre-Unicode) code pages.[1] Many applications inevitably have to support UTF-8 because it is the most widely used Unicode encoding scheme in various network protocols, including the Internet Protocol Suite.
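The idea that one pair of conversion functions serves both legacy code pages and the Unicode byte-oriented encodings can be sketched in Python. The functions multi_byte_to_wide and wide_to_multi_byte below are hypothetical stand-ins that only imitate the role of the Windows functions. They do not reproduce their signatures or flags. The code page identifiers 65001 (UTF-8) and 65000 (UTF-7) are the values Windows uses, while the CODECS table mapping them to Python codec names is an assumption made for this example.

```python
CP_UTF7 = 65000  # Windows code page identifier for UTF-7
CP_UTF8 = 65001  # Windows code page identifier for UTF-8

# Hypothetical lookup table: code page identifier -> Python codec name.
CODECS = {
    CP_UTF7: "utf-7",
    CP_UTF8: "utf-8",
    1251: "cp1251",  # legacy Cyrillic code page
    1252: "cp1252",  # legacy Western European code page
}


def multi_byte_to_wide(code_page: int, data: bytes) -> bytes:
    """Convert bytes in the given code page to UTF-16LE bytes."""
    return data.decode(CODECS[code_page]).encode("utf-16-le")


def wide_to_multi_byte(code_page: int, wide: bytes) -> bytes:
    """Convert UTF-16LE bytes to bytes in the given code page."""
    return wide.decode("utf-16-le").encode(CODECS[code_page])


if __name__ == "__main__":
    utf8_bytes = "Привет".encode("utf-8")
    wide = multi_byte_to_wide(CP_UTF8, utf8_bytes)   # UTF-8 -> UTF-16LE
    legacy = wide_to_multi_byte(1251, wide)          # UTF-16LE -> code page 1251
    print(len(utf8_bytes), len(wide), len(legacy))
```

From the caller's point of view, UTF-8 is just another value of the code page argument. This is why the same entry points handle it.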

  1. "UTF-8 in Windows". Stack Overflow. http://stackoverflow.com/questions/166503/utf-8-in-windows. Retrieved July 1, 2011.
